Design and Performance of Directory Caches for Scalable Shared Memory Multiprocessors
Abstract
Recent research shows that the occupancy of the coherence controllers is a major performance bottleneck for distributed cache-coherent shared-memory multiprocessors. A significant part of this occupancy is due to the latency of accessing the directory, which is usually kept in DRAM. Most coherence controller designs that execute the coherence protocol handlers on protocol processors use the protocol processor's data cache to cache directory entries along with protocol handler data. Analogously, hardwired coherence controller designs can use a fast Directory Cache (DC) to minimize directory access time. However, existing hardwired controllers do not use a directory cache, and the performance impact of caching directory entries has not previously been studied in the literature. This paper studies the performance of directory caches using parallel applications from the SPLASH-2 suite. We demonstrate that a directory cache can improve the execution time of communication-intensive applications by 40% or more. We also investigate in detail the main directory cache design parameters: cache size, cache line size, and associativity. Our experimental results show that directory cache size requirements grow sub-linearly with the application's data set size. The results also show the performance advantage of multi-entry directory cache lines, owing to spatial locality and the absence of directory sharing. Associativity has less impact on performance than cache size and line size. Finally, we find a clear linear relation between the directory cache miss ratio and the coherence controller occupancy, and between both measures and application execution time; this relation can help system architects evaluate the impact of directory cache (or coherence controller) designs on overall system performance.
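To make the design parameters above (cache size, line size, associativity, and miss ratio) concrete, here is a minimal Python sketch of a set-associative directory cache in which each line holds several consecutive directory entries. It is not the paper's simulator: the class and function names, the LRU replacement policy, the synthetic trace, and the linear cost model with its cycle counts are all hypothetical assumptions chosen for illustration.

```python
from collections import OrderedDict


class DirectoryCache:
    """Hypothetical set-associative directory cache with LRU replacement.

    Each cache line holds `entries_per_line` directory entries for
    consecutive memory blocks, so one miss also brings in the entries of
    neighbouring blocks (spatial locality).
    """

    def __init__(self, num_lines: int, entries_per_line: int, assoc: int) -> None:
        if num_lines % assoc != 0:
            raise ValueError("num_lines must be a multiple of assoc")
        self.entries_per_line = entries_per_line
        self.assoc = assoc
        self.num_sets = num_lines // assoc
        # One OrderedDict of tags per set, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        """Look up the directory entry of a memory block; return True on a hit."""
        line = block // self.entries_per_line
        lru = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in lru:
            lru.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lru) >= self.assoc:
            lru.popitem(last=False)  # evict the least recently used line
        lru[tag] = True  # on a miss the line would be fetched from DRAM
        return False

    def miss_ratio(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def run_trace(trace, num_lines, entries_per_line, assoc):
    """Replay a trace of memory-block numbers and return the miss ratio."""
    dc = DirectoryCache(num_lines, entries_per_line, assoc)
    for block in trace:
        dc.access(block)
    return dc.miss_ratio()


def directory_access_cycles(miss_ratio, hit_cycles, dram_cycles):
    """Hypothetical linear model: average directory-access cycles per
    protocol handler, where each miss adds a DRAM directory access."""
    return hit_cycles + miss_ratio * dram_cycles


if __name__ == "__main__":
    # Synthetic trace: two sequential sweeps over 64 memory blocks.
    trace = list(range(64)) * 2
    for epl in (1, 4):
        mr = run_trace(trace, num_lines=32, entries_per_line=epl, assoc=2)
        # Hypothetical latencies: 2-cycle DC hit, 60-cycle DRAM directory access.
        print(epl, mr, directory_access_cycles(mr, hit_cycles=2, dram_cycles=60))
```

With one entry per line, the 32-line cache cannot hold the 64 entries touched by the sweep, and LRU thrashes (miss ratio 1.0). With four entries per line, the same 32 lines cover all 64 blocks, so only the first access to each line misses (miss ratio 0.125). The linear cost function is only a stand-in for the empirical relation reported in the abstract, not a model taken from the paper.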
Related articles
A Versatile Directory Scheme (Dir2NB+L) and Its Implementation on BY91-1 Multiprocessors System
Cache coherence and synchronization between processors have been two critical issues in designing a shared memory multiprocessors system. From the perspective of hardware design, a directory based cache coherence protocol and lock mechanism are employed to prevent inconsistency of caches and warrant atomic memory accesses. The BY91-1 multiprocessors efficiently integrate supports for cache coher...
Evaluation of Design Alternatives for a Directory-Based Cache Coherence Protocol in Shared-Memory Multiprocessors
In shared-memory multiprocessors, caches are attached to the processors in order to reduce the memory access latency. To keep the memory consistent, a cache coherence protocol is needed. A well known approach is to record which caches have copies of a memory block in a directory and only notify the caches having a copy when a processor modifies the block. Such a protocol is called a directory-b...
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
Directories have been used to maintain cache coherency in shared memory multiprocessors with private caches. The traditional full map directory tracks the exact caching status for each shared memory block and is designed to be efficient and simple. Unfortunately, the inherent directory size explosion makes it unsuitable for large-scale multiprocessors. In this paper, we propose a new directory...
A Novel Lightweight Directory Architecture for Scalable Shared-Memory Multiprocessors
There are two important hurdles that restrict the scalability of directory-based shared-memory multiprocessors: the directory memory overhead and the long L2 miss latencies due to the indirection introduced by the accesses to directory information, usually stored in main memory. This work presents a lightweight directory architecture aimed at facing these two important problems. Our proposal ta...
Two proposals for the inclusion of directory information in the last-level private caches of glueless shared-memory multiprocessors
In glueless shared-memory multiprocessors where cache coherence is usually maintained using a directory-based protocol, the fast access to the on-chip components (caches and network router, among others) contrasts with the much slower main memory. Unfortunately, directory-based protocols need to obtain the sharing status of every memory block before coherence actions can be performed. This info...